Abstract


In this study, we report the complete genome sequence of two contemporary human coronavirus OC43 (HCoV-OC43) strains detected in 
2003 and 2004, respectively. Comparative genetic analyses of the circulating strains and the prototype HCoV-OC43 strain (ATCC VR759) 
were performed. Remarkably, a lower than expected similarity is found between the complete genomes, and more particularly between the
spike genes of the BE03 and BE04 strains. This finding suggests the existence of two genetically distinct HCoV-OC43 strains, circulating in 
Belgium in 2003 and 2004. Spike gene sequencing of three additional 2003 and two additional 2004 HCoV-OC43 strains, and subsequent 
phylogenetic analysis confirm this assumption. Compared to the ATCC prototype HCoV-OC43 strain, an important amino acid substitution is 
present in the potential cleavage site sequence of the spike protein of all contemporary strains, restoring the N-RRXRR-C motif, associated 
with increased spike protein cleavability in bovine coronaviruses. We here describe specific characteristics associated with circulating HCoV-OC43
strains, and we provide substantial evidence for the genetic variability of HCoV-OC43.
Introduction


Coronaviruses (family Coronaviridae, order Nidovi- 
rales) are large, roughly spherical particles with a linear, 
non-segmented, capped, and polyadenylated positive-sense 
single-stranded RNA genome (Cavanagh, 1997; Lai and 
Holmes, 2001). The virions contain four major structural 
proteins: the nucleocapsid (N) protein, the membrane (M) 
glycoprotein, the spike (S) glycoprotein and the small 
membrane or envelope (E) protein. A hemagglutinin- 
esterase (HE) glycoprotein gene is only present in group 2 
coronaviruses, which include human coronavirus OC43 
(HCoV-OC43), bovine coronavirus (BCoV), porcine 
hemagglutinating encephalomyelitis virus (PHEV), canine 
respiratory coronavirus (CRCoV), mouse hepatitis virus
(MHV), rat sialodacryoadenitis virus (SDAV) and equine 
coronavirus (ECoV) (Spaan et al., 1988; Zhang et al., 1992). 

Human coronaviruses (HCoV) cause respiratory infections,
but gastroenteritis and neurological disorders can also
occur (Arbour et al., 2000; Lai and Holmes, 2001). Until
now, five types of human coronaviruses have been 
described: HCoV-OC43, HCoV-229E, HCoV-NL63, the 
recently characterized HCoV-HKU1 and the causal agent 
of the SARS outbreak, the SARS-coronavirus (SARS-CoV). 
HCoV-OC43 (ICTVdb code 19.0.1.0.006) and HCoV-229E
(ICTVdb code 19.0.1.0.005) were isolated in 1967 from 
volunteers at the Common Cold Unit in Salisbury, UK. 
HCoV-OC43 was initially propagated on ciliated human 
embryonic tracheal and nasal organ cultures (OC) (McIntosh
et al., 1967). HCoV-OC43 and HCoV-229E are
responsible for 10 to 30% of all common colds, and 
infections occur mainly during the winter and early spring 
(Larson et al., 1980). The incubation period is 2 to 4 days.
During the 2002-2003 winter season, a new human 
coronavirus, HCoV-NL63, was isolated in The Netherlands 
(van der Hoek et al., 2004). The frequent detection of 
HCoV-NL63 in patient samples worldwide indicates that 
HCoV-NL63 can be considered as a new important etiologic 
agent in respiratory tract infections (Arden et al., 2005; 
Bastien et al., 2005; Moés et al., 2005). Coronaviruses infect 
all age groups, and reinfections are common. The infection 
can be subclinical and is usually mild, but there have been 
reports of more severe lower respiratory tract involvement 
in infants and elderly people (Gagneur et al., 2002; Vabret et 
al., 2003). Unlike HCoV-OC43 and HCoV-229E, SARS- 
CoV and also HCoV-NL63 appear to be associated with 
more severe respiratory symptoms like pneumonia and 
bronchiolitis. Only recently, a fifth human coronavirus type, 
HCoV-HKU1, has been discovered in two patients with
pneumonia, and based on genomic analysis, HCoV-HKU1 
was proposed to be a distant member of group 2 
coronaviruses (Woo et al., 2005). 

In our previous study, we described the complete genome 
sequence of the HCoV-OC43 prototype strain (ATCC 
VR759), isolated in 1967 from an adult with common 
cold-like symptoms (McIntosh et al., 1967; Vijgen et al., 
2005a). We demonstrated a high rate of similarity with 
bovine coronaviruses, and we postulated that both viruses 
diverged from each other (Vijgen et al., 2005a). Molecular 
clock analysis situated the most recent common ancestor of 
both viruses around 1890. The evolutionary rate of the 
HCoV-OC43/BCoV pair was estimated to be in the order of 10^-4
nucleotide substitutions per site per year (Vijgen et al., 
2005a), which is in the same range as reported for other 
RNA viruses (Domingo and Holland, 1988). The prototype 
HCoV-OC43 strain is, however, a laboratory strain, that 
since its isolation in 1967 was passaged seven times in 
human embryonic tracheal organ culture, followed by 15 
passages in suckling mouse brain, and an unknown number 
of passages in human rectal tumor HRT-18 cells and/or Vero 
cells. During the passage history, it is likely that a number of 
mutations have accumulated. Recently, St-Jean et al. (2004) 
reported the complete genome sequence of a HCoV-OC43 
clinical isolate, designated Paris. A high degree of genetic 
stability was stated for HCoV-OC43 since only 6 nucleotide 
variations in the whole genome could be observed in 
comparison to the HCoV-OC43 ATCC strain, with isolation 
dates 34 years apart. Based on evolutionary analyses, we 
demonstrated that this finding seems unlikely, and we 
suggested that the HCoV-OC43 Paris isolate might have 
been a result of cross-contamination with the ATCC HCoV- 
OC43 strain (Vijgen et al., 2005b). In this study, we present
the complete genome sequence data of two contemporary 
non-cell culture adapted HCoV-OC43 strains. The HCoV- 
OC43 BE03 strain (isolate 87309) was detected in 2003
from a 2-year-old girl suffering from bronchiolitis, and the 
HCoV-OC43 BE04 strain (isolate 19572) was detected in 
2004 in a 1-year-old boy, who presented with rhinitis,
bronchiolitis and coughing. Both patients were hospitalized
at the University Hospital in Leuven, Belgium. Spike gene 
sequence data are also determined for three additional 2003 
and two additional 2004 HCoV-OC43 strains. Phylogenetic 
and comparative sequence analyses of these data are 
performed, providing closer insight into the HCoV-OC43
genetic variability. 


Results 


We here present the complete nucleotide sequence of two 
contemporary HCoV-OC43 strains. The genome of both 
contemporary HCoV-OC43 strains encompasses 30723 
nucleotides (excluding the 3’ poly(A) tail), being 15 
nucleotides shorter than the genome of the ATCC prototype 
strain (GenBank accession number AY391777). Further- 
more, we here report the spike gene sequences of three 
additional 2003 HCoV-OC43 strains (HCoV-OC43 BE03 
isolates 89996, 37767 and 84020) and two additional 
HCoV-OC43 strains from 2004 (HCoV-OC43 BE04 
34364 and 36638). These sequences were deposited in the 
GenBank database under accession numbers AY903454- 
AY903460.

Pairwise sequence alignments demonstrate an overall 
genome similarity of 99.0% between the HCoV-OC43 
ATCC prototype strain (AY391777) and the HCoV-OC43 
BE03 strain as well as between the prototype strain and the
BE04 strain. Complete genome sequence comparisons to 
the HCoV-OC43 ATCC prototype strain sequenced by St- 
Jean et al. (AY585228) and the Paris isolate (AY585229) 
show similar results: 99.0% identity is found between the 
BE03 strain and the ATCC strain (AY585228) as well as
between the BE03 strain and the Paris isolate. For the
HCoV-OC43 BE04 strain 98.9% similarity with the 
complete genome sequences of both the ATCC strain 
(AY585228) and the Paris isolate is found. The genetic 
divergence between the prototype and each of the contem- 
porary HCoV-OC43 strains roughly corresponds to an 
evolutionary rate of about 2.7 × 10^-4 nucleotide substitutions
per site per year. The 2003 and 2004 HCoV-OC43
strains show 99.4% similarity throughout their complete 
genomes. When comparing the genome sequence data of the 
HCoV-OC43 ATCC (AY391777) and contemporary strains, 
several nucleotide deletions and insertions are observed in 
the 3’ part of the genome (Fig. 1). Some of these nucleotide 
deletions and insertions are common for both contemporary 
strains in reference to the prototype strain, while others are 
observed in only one of the strains compared to the 
prototype strain. Similarity percentages between the 
HCoV-OC43 BE03 and BE04 strains, the ATCC prototype
strain and the HCoV-OC43 Paris isolate are calculated for 
the major open reading frames (ORFs) and their proteins 
(Table 1). In the 2003 strain, the lowest percentage of 
identities with the other HCoV-OC43 strains is found for the 
envelope protein (E) gene and the corresponding amino acid 
Fig. 1. Overview of the genome organisation of the ATCC prototype (GenBank accession number AY391777) and contemporary HCoV-OC43 strains. A
stretch of nucleotides, present in the genome of one strain but not in that of another, is indicated by a triangle. The boxed regions indicate the size of the
nucleotide stretch.


sequence. However, the low rate of similarity is not due to a 
high number of mutations but to a 6 nucleotide deletion at 
the 3’ end of the HCoV-OC43 BE03 E gene leading to the
deletion of a stop codon. The ORF is thereby elongated until 
the next stop codon, which is present 14 base pairs (bp) 
downstream. The HCoV-OC43 BE03 E protein is therefore 


Table 1
Nucleotide (and amino acid) similarity percentages of the major ORFs of
the HCoV-OC43 ATCC prototype strain (GenBank acc. nr. AY391777), the
Paris isolate, and the BE03 and BE04 strains

ORF       ATCC/BE03      ATCC/BE04      Paris/BE03     Paris/BE04     BE03/BE04
ORF1a     99.4 (99.4)    99.4 (99.3)    99.4 (99.4)    99.4 (99.3)    99.8 (99.8)
ORF1b     99.5 (99.8)    99.5 (99.7)    99.5 (99.8)    99.5 (99.7)    99.8 (99.8)
ns2       99.6 (100.0)   99.8 (100.0)   99.5 (100.0)   99.6 (100.0)   99.6 (100.0)
HE        98.0 (97.2)    97.7 (96.5)    98.4 (97.2)    98.1 (96.5)    99.2 (98.8)
S         97.3 (96.2)    97.0 (95.5)    97.1 (96.0)    96.5 (95.2)    97.2 (96.9)
ns12.9    99.4 (98.2)    99.7 (99.1)    99.4 (98.2)    99.7 (99.1)    99.7 (99.1)
E (a)     93.3 (94.3)    99.6 (98.8)    93.3 (94.3)    99.6 (98.8)    93.6 (95.5)
M         98.8 (98.3)    98.7 (98.3)    99.0 (98.3)    98.8 (98.3)    99.6 (100.0)
N         99.2 (98.7)    98.8 (98.2)    99.3 (98.9)    98.9 (98.4)    99.2 (98.2)
Ia/Ib (b) 99.2 (ND c)    99.0 (ND c)    99.4 (99.0)    99.2 (98.6)    98.9 (98.6)

(a) A deletion of a stop codon in the HCoV-OC43 BE03 E gene leads to an
elongation of the ORF with 12 base pairs, and thus 4 amino acids.
(b) The HCoV-OC43 ATCC I coding region contains a stop codon at position
29345, resulting in two potential coding regions of 60 amino acids (Ia) and
115 amino acids (Ib). The percentage nucleotide similarity is calculated for
the continuous Ia/Ib region.
(c) ND: not done.



4 amino acids longer than its HCoV-OC43 ATCC, BE04 
and Paris counterparts. Sequencing of the E gene of the 
other HCoV-OC43 2003 strains demonstrated the same 
elongation of the E ORF. 
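
The effect of losing a stop codon can be illustrated with a minimal sketch. The sequences below are invented toy sequences, not the HCoV-OC43 E gene, and the helper name sense_codons is hypothetical; the sketch only shows that deleting a short stretch containing the stop codon lets the reading frame run on to the next in-frame stop codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def sense_codons(seq):
    """Count in-frame codons from the first base up to (not including) the first stop codon."""
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i // 3
    return None  # no in-frame stop codon found

# Toy wild-type sequence (hypothetical): ATG GCT AAA GTT TAA, followed by downstream bases
wild = "ATGGCTAAAGTTTAA" + "CCTGAACGTTGA"

# Delete the 6 nucleotides "GTTTAA" at the 3' end of the toy ORF (last codon plus stop codon)
mutant = wild[:9] + wild[15:]

print(sense_codons(wild), sense_codons(mutant))
```

In this toy case the ORF reads through into the downstream sequence until the next in-frame stop codon is met. The exact gain in length depends on where that next stop codon lies, so the numbers here are not meant to reproduce those reported for the BE03 E gene.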

The S gene is the most polymorphic ORF, with 
nucleotide (and amino acid) identity percentages of 97.3% 
(96.2%) between the ATCC prototype strain (AY391777) 
and the BE03 strain, 97.1% (96.0%) between the Paris
isolate and the BE03 strain, 97.0% (95.5%) for the
prototype strain/BE04 pair, and 96.5% (95.2%) for the
Paris/BE04 pair. SimPlot analysis of the complete genomes
of the HCoV-OC43 BE03 and BE04 strains, the Paris isolate
and the HCoV-OC43 ATCC strain (AY585228) in reference 
to the ATCC prototype strain (AY391777), also demon- 
strates for the BE03 and BE04 strains that the highest 
variability is found in the genome region containing the 
spike gene (Fig. 2). Most remarkably, the spike genes of the 
BE03 and BE04 strains, with detection dates only one year 
apart, show only 97.2% nucleotide and 96.9% amino acid 
similarity which corresponds to a total of 93 nucleotide 
variations leading to 34 amino acid changes, 15 nucleotide 
(or 5 amino acid) insertions in the BE03 strain spike gene
and 9 nucleotide (or 3 amino acid) insertions in the BE04 
strain spike gene. Out of 93 polymorphic nucleotides, 60 are 
located in the S1 subunit and these include 29 synonymous 
substitutions and 31 nonsynonymous substitutions leading 
to 23 amino acid changes. Nucleotide insertions and 
Fig. 2. SimPlot analysis of complete genome sequence data of the HCoV-OC43 BE03, BE04, Paris, and ATCC (GenBank accession number AY585228)
strains in reference to HCoV-OC43 ATCC strain GenBank accession number AY391777. Each point plotted is the percentage genetic distance within a sliding 
window of 600 bp wide, centered on the position plotted, with a step size of 100 bp. Each curve is a comparison between the genome of one of the strains and 
the reference genome of the ATCC HCoV-OC43 strain (AY391777). 


deletions are also exclusively found in the S1 subunit, which
codes for the fragment of the spike glycoprotein that binds
to its cellular receptor. The putative N-glycosylation pattern
of the spike proteins of the contemporary strains also differs
slightly between both strains and in comparison to that of
the HCoV-OC43 prototype spike protein. There are 15
Fig. 3. Neighbor joining phylogenetic tree of the complete spike gene nucleotide sequence data of group 2 coronaviruses: HCoV-OC43 BE03 strain (GenBank
accession numbers AY903454, AY903456, AY903457, AY903459), HCoV-OC43 BE04 strain (AY903455, AY903458, AY903460), HCoV-OC43 ATCC 
VR759 strain (AY391777, AY585228, L14643, Z21849, S62886, Z32768, Z32769), HCoV-OC43 serotype OC43 Paris (AY585229), BCoV LUN 
(AF391542), BCoV ENT (AF391541), BCoV Mebus (U00735), BCoV Quebec (AF220295), CRCoV (AY150272), PHEV strain 67N (AY078417), ECoV 
strain NC99 (AY316300), SDAV (AF207551) and MHV strain A59 (AY700211). Bootstrap values over 75% are shown. 
potential N-glycosylation sites (Asn-Xxx-Ser/Thr) in the
spike protein of the 2003 strain, and 17 are predicted to be 
present in the spike protein of the 2004 strain, while the 
prototype HCoV-OC43 spike protein contains 14 potential 
N-glycosylation sites (NetNGlyc 1.0 Server). 
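
As a rough illustration of what a potential N-glycosylation site means at the sequence level, the sketch below lists Asn-Xxx-Ser/Thr sequons in a protein string. It is only a pattern scan and does not reproduce the NetNGlyc predictions reported above. The function name and the toy peptide are hypothetical, and the exclusion of proline at the Xxx position is a common convention assumed here, not something stated in the text.

```python
import re

# Sequon Asn-Xxx-Ser/Thr; the lookahead lets overlapping sequons be found.
# Excluding proline at the Xxx position is an assumption (common convention).
SEQUON = re.compile(r"N(?=[^P][ST])")

def sequon_positions(protein):
    """Return the 1-based position of the Asn in each candidate sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]

# Toy peptide (hypothetical, not a fragment of the spike protein)
toy = "MNGSANPTNNTS"
print(sequon_positions(toy))
```

Applied to full-length spike protein sequences, a scan of this kind gives the number of candidate sequons per strain, which can then be compared between strains.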

The observed variability in the spike gene of the 2003 
and 2004 HCoV-OC43 strains is further investigated by 
sequencing this gene in three additional 2003 and two 
additional 2004 HCoV-OC43 positive samples. Among the 
strains sampled in the same year, a high degree of 
conservation is found in their spike genes: no more than 
0.15% difference between the 2003 strains, and no more 
than 0.05% between the 2004 strains. A multiple sequence 
alignment of these sequence data and spike gene sequence 
data of HCoV-OC43 and of other group 2 coronaviruses, 
available in Genbank, is performed. Based on this align- 
ment, a neighbor joining phylogenetic tree is constructed 
and evaluated by 1000 bootstrap pseudoreplicates (Fig. 3). 
In the HCoV-OC43 branch, three phylogenetic clusters can 
clearly be demonstrated: the ATCC cluster, containing all 
ATCC strains as well as the Paris isolate, a second cluster 
containing all HCoV-OC43 BE03 strains, and a third cluster,
in which all HCoV-OC43 BE04 strains are found. Molecular
clock analysis is performed using the spike gene sequence
data of HCoV-OC43 and BCoV strains for which the 
sampling date is known (Table 2). A likelihood ratio test 
indicates that the molecular clock hypothesis, which 
assumes that the evolutionary rate is roughly constant 
among lineages, cannot be rejected for these serially 
sampled data (P = 0.62). Using a Bayesian coalescent 
approach, the time to the most recent common ancestor 
(TMRCA) of the HCoV-OC43 2003 and 2004 strains is 
estimated in 1971 with a 95% highest posterior density 
interval of 1962—1979. The evolutionary rate is estimated to 


Table 2
Sampling data of bovine and human coronaviruses to calculate the TMRCA

Strain                          GenBank accession nr.   Sampling date
HCoV-OC43 ATCC                  AY391777                1967
HCoV-OC43 ATCC                  AY585228                1967
HCoV-OC43 ATCC                  Z21849                  1967
HCoV-OC43 BE03 isolate 37767    AY903457                2003
HCoV-OC43 BE03 isolate 84020    AY903456                2003
HCoV-OC43 BE03 isolate 87309    AY903459                2003
HCoV-OC43 BE03 isolate 89996    AY903454                2003
HCoV-OC43 BE04 isolate 19572    AY903460                2004
HCoV-OC43 BE04 isolate 34364    AY903455                2004
HCoV-OC43 BE04 isolate 36638    AY903458                2004
BCoV Mebus                      U00735                  1972
BCoV Quebec                     AF220295                1972
BCoV LUN                        AF391542                1998
BCoV ENT                        AF391541                1998
BCoV-LSU94                      AF058943                1994
BCoV M80844                     M80844                  1989
BCoV-LY138                      AF058942                1965
BCoV BECS                       D00731                  1979


be 3.5 × 10^-4 substitutions per site per year (95% highest
posterior density interval of 2.6 × 10^-4 to 4.5 × 10^-4).

In all 2003 and 2004 contemporary HCoV-OC43 strains, 
an important amino acid change, in reference to the ATCC 
strains, is present in the proteolytic cleavage site of the
HCoV-OC43 S protein. Unlike the HCoV-OC43 ATCC 
prototype strain, in which this sequence is RRSRG, the 
contemporary strains have a G to R amino acid change in 
the last position, leading to an RRSRR motif. 
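
The difference between the two cleavage site sequences can be checked against the Arg-Arg-Xxx-Arg-Arg pattern with a few lines of code. This is a minimal sketch; the function name is hypothetical, and matching the pattern says nothing by itself about whether the protein is actually cleaved.

```python
import re

# N-Arg-Arg-Xxx-Arg-Arg-C, written in one-letter code as RRXRR
CLEAVAGE_MOTIF = re.compile(r"RR.RR")

def has_cleavage_motif(site):
    """Return True if the sequence contains the RRXRR pattern."""
    return CLEAVAGE_MOTIF.search(site) is not None

print(has_cleavage_motif("RRSRG"))  # ATCC prototype sequence
print(has_cleavage_motif("RRSRR"))  # contemporary strains
```

Only the sequence with arginine in the last position matches the pattern, which is the basis for the single G to R change being singled out here.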

In the HCoV-OC43 ATCC prototype strain, the internal 
ORF (I) coding region contains a stop codon at position 
29345, resulting in two potential coding regions of 60 amino 
acids and 115 amino acids. This stop codon is not present in 
the BE03 and BE04 strains, which have the capacity to code 
for a 207-amino acid protein. 


Discussion 


Human coronaviruses are an important cause of respiratory
tract infections. Before the SARS outbreak, coronaviruses
had been somewhat neglected in human medicine,
although they have always played a significant role in 
animal virology. Only recently, we reported the complete 
genome sequence of the prototype HCoV-OC43 strain
(ATCC VR759; GenBank accession number AY391777)
(Vijgen et al., 2005a). This strain, however, has been
passaged several times in organ culture, suckling mouse 
brain and cell culture, and the presence of culture-related 
polymorphisms is likely. In this study, we screened a 
collection of RSV-negative bronchiolitis samples in order to 
identify contemporary HCoV-OC43 strains and we deter- 
mined the full-length genome sequence of two circulating 
strains (HCoV-OC43 BE03 and BE04) detected in 2003 and 
2004, respectively. When comparing the complete genome 
sequence of the contemporary strains to that of the ATCC 
prototype strain and to the HCoV-OC43 Paris isolate, we 
found similarity percentages of 98.9% to 99.0%. The 
genetic divergence between each of the circulating strains 
and the ATCC strain corresponds roughly to an evolutionary 
rate of about 2.7 × 10^-4 nucleotide substitutions per site per
year. This value is consistent with the previously estimated 
evolutionary rate of the BCoV/HCoV-OC43 pair, and that of 
other coronaviruses (TGEV and SARS-CoV) (Salemi et al., 
2004; Sanchez et al., 1992; Vijgen et al., 2005a). 
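
The rough rate quoted above can be checked with simple arithmetic: the fraction of differing sites divided by the number of years separating the sampling dates. The sketch below assumes, as a simplification, that all divergence accumulated along a single lineage since 1967 and that no site changed more than once; the function name is hypothetical.

```python
def naive_rate(similarity_percent, year_a, year_b):
    """Substitutions per site per year from a similarity percentage and two sampling years."""
    divergence = (100.0 - similarity_percent) / 100.0
    return divergence / abs(year_b - year_a)

# 99.0% genome similarity between the 1967 prototype and a strain sampled 36 or 37 years later
print(naive_rate(99.0, 1967, 2003))
print(naive_rate(99.0, 1967, 2004))
```

Both values fall close to 2.7 × 10^-4 to 2.8 × 10^-4 substitutions per site per year, in line with the rough figure given in the text. The Bayesian estimate from the spike gene data is obtained differently and is not expected to match this calculation exactly.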

In reference to the prototype strain, in frame nucleotide 
insertions and deletions are present in the HE, S and E genes 
of the contemporary strains, leading to variability in their 
predicted protein lengths. HE, S and E proteins are 
expressed at the surface of the virus particles and host 
immune pressures can contribute to the occurrence of
variations, insertions and deletions in their genes. Deletion 
of a stop codon at the 3’ end of the HCoV-OC43 E gene of 
the BE03 strain elongates this ORF with 12 bp or 4 amino
acids. The same elongation of the E ORF by deletion of a 
stop codon is also present in the other HCoV-OC43 strains 
from 2003, but not in the 2004 samples, indicating that this
phenomenon might be characteristic for the HCoV-OC43 
strain circulating in Belgium in 2003. These four additional 
amino acids (IQTL) are present at the carboxyterminus, the 
part of the E protein that is located inside the virion. 
Whether this E protein elongation has a functional 
significance remains to be elucidated. The spike gene and 
protein are the most polymorphic when comparing the 
major ORFs of the prototype and contemporary HCoV- 
OC43 strains. The spike glycoprotein is the major corona- 
virus antigen and plays an important role in attachment of 
the virus to cell surface receptors and induces the fusion of 
viral and cellular membranes (Spaan et al., 1988). Host 
immunity escape mechanisms, variation in host range and 
tissue tropism of coronaviruses are largely attributed to 
variations in the spike glycoprotein (Gallagher and Buch- 
meier, 2001). Interestingly, the similarity between the spike 
genes and proteins of the 2003 and 2004 circulating HCoV- 
OC43 strains is lower (97.2% and 96.9%, respectively) than 
would be expected based on the HCoV-OC43 evolutionary 
rate. The observed spike gene variability provides evidence 
for the existence of several genetically distinct HCoV-OC43 
strains with different temporal and possible geographical 
circulation patterns. Spike gene sequencing of the additional 
2003 and 2004 HCoV-OC43 strains reveals more character- 
istics of the HCoV-OC43 strains circulating in 2003 and 
2004. All BE03 strains have a 4092-nt spike gene coding for
a 1363-amino acid protein, while the spike gene of all BE04
strains is six nucleotides shorter, encoding a 1361-amino
acid protein. The HCoV-OC43 BE04 spike gene is of the 
same length as the HCoV-OC43 ATCC prototype strain 
spike gene, but shows a different nucleotide insertion and 
deletion pattern. Strain-specific nucleotide variations, inser- 
tions and deletions are present among all samples of the 
same year of detection. Phylogenetic analysis of spike gene 
sequences of these four Belgian BE03 and three Belgian 
BE04 HCoV-OC43 strains, of the HCoV-OC43 Paris isolate 
spike gene, and of all ATCC strain spike sequence data, 
confirms the existence of genetically different HCoV-OC43 
strains. Three distinct phylogenetic clusters can be demon- 
strated: the ATCC cluster, containing all ATCC laboratory 
strains as well as the Paris isolate, a second cluster, 
containing all HCoV-OC43 BE03 strains, and a third 
cluster, in which all HCoV-OC43 BE04 strains are found 
(Fig. 3). The HCoV-OC43 Paris isolate clusters with all 
ATCC strains and not with the contemporary strains, 
implying that this isolate might not be a circulating
strain, but rather a result of cross-contamination with the 
ATCC HCoV-OC43 strain (Vijgen et al., 2005b). 

At a certain time in history the HCoV-OC43 2003 and 
2004 circulating strains diverged from each other, and based 
on molecular clock analysis of spike gene sequence data, 
their most recent common ancestor can be dated back to 
1971 (95% highest posterior density interval: 1962—1979). 
According to this molecular clock model, an evolutionary 
rate of 3.5 × 10^-4 nucleotide substitutions per site per year
is estimated (95% highest posterior density interval:
2.6 × 10^-4 to 4.5 × 10^-4), a value consistent with our previous
findings (Vijgen et al., 2005a). The existence of two
different HCoV-OC43 strains circulating in Belgium in 
2003 and in 2004 is also supported by the difference in 
length of the E protein, which is four amino acids longer in 
the BE03 samples, while this elongation of the E ORF is not 
present in the BE04 samples.

Cleavage of the coronavirus spike protein into the 
subunits S1 and S2 is mediated by cellular trypsin-like
proteases acting at the C-terminus of the sequence N-Arg- 
Arg-Xxx-Arg-Arg-C (Abraham et al., 1990). This cleavage 
process is believed to play an important, although not 
obligatory role in the fusion activity and viral infectivity of 
BCoV and MHV (Stauber et al., 1993; Storz et al., 1981). In 
all BE03 and BE04 HCoV-OC43 strains, a glycine to
arginine variation compared to all spike protein sequences 
of the ATCC HCoV-OC43 strain and the Paris isolate is 
present in this proteolytic cleavage site. This observation 
might have important functional consequences, as
the cleavage site sequence is RRSRG in the ATCC strains 
and these have been reported to have an uncleaved spike 
protein (Hogue and Brian, 1986; Künkel and Herrler, 1993).
All 2003 and 2004 contemporary strains, however, have a G 
to R amino acid change in the last position, leading to an 
RRSRR motif, identical to that of BCoV-Mebus, which 
therefore might lead to an increased cleavability compared 
to the ATCC prototype HCoV-OC43 spike protein. This 
amino acid change is also present in two HCoV-OC43 
strains described by Künkel and Herrler (1996) (OC43-CU
and OC43-VA), which have been shown to be cleaved to an 
extent of nearly 40% in infected cells. Analysis of the spike 
gene of these two strains in comparison to the ATCC 
prototype HCoV-OC43 strains and to BCoV-Mebus reveals 
a closer relationship to BCoV than to the ATCC HCoV- 
OC43 strains. 

Another observation when comparing the HCoV-OC43 
contemporary and prototype strains is present in the internal 
(I) ORF, located within the nucleocapsid gene. In both
contemporary HCoV-OC43 strains, the I gene encodes a 
207-amino acid protein, as is also observed in BCoV, PHEV 
and MHV (Fischer et al., 1997; Lapps et al., 1987). In the 
HCoV-OC43 prototype strain, however, an early stop codon 
is present in the I coding region leading to two potential 
coding regions of 60 amino acids and 115 amino acids. The 
occurrence of this early stop codon is due to a single 
nucleotide variation, which probably occurred during cell 
culture passaging. 

In this study, we performed a comparative genomic 
analysis of two circulating non-cell culture adapted HCoV- 
OC43 strains and the cell cultured prototype strain. 
Furthermore, based on spike gene sequence data of four 
2003 and three 2004 strains, we demonstrate the circulation 
of two genetically different HCoV-OC43 strains in Belgium 
in 2003 and 2004, respectively. We provide substantial 
evidence for the genetic variability of HCoV-OC43, 
supporting our previous conclusions based on evolutionary
analyses (Vijgen et al., 2005b). 


Materials and methods 
Screening for contemporary coronavirus strains 


A collection of respiratory tract specimens from neonates 
and infants hospitalized with respiratory syncytial virus 
(RSV)-negative respiratory tract infections was screened 
using coronavirus consensus primers amplifying a 251-bp 
amplicon in a RT-PCR reaction (Moés et al., 2005). No prior 
amplification by cell culture was performed. Samples found 
positive were sequenced in both directions using the RT- 
PCR primer set. 


Sequencing of the contemporary HCoV-OC43 genomes 


To determine the genomic sequence of the contemporary 
HCoV-OC43 strains, a set of overlapping RT-PCR products 
with an average size of 1.5 kb encompassing the entire 
genome were generated. For both RT-PCR and sequencing, 
oligonucleotide primers were designed in regions that were 
conserved between the BCoV and MHV genomes. The 
forward PCR primer in the 5’-terminal sequence (OC43F1: 
5’-GATTGTGAGCGATTTGC-3’) was based on the HCoV- 
OC43 5’UTR partial sequence (Wu, H.Y., Guy, J.S., Yoo,
D., Vlasak, R. and Brian, D.A., University of Tennessee, 
Knoxville, TN, unpublished; GenBank accession number 
AF523847). To generate RT-PCR products containing the 
exact 3’-terminal sequence, we used oligonucleotide 
OC43R74 (5’-TTTTTTTTTTGTGATTCTTCCA-3’) based 
on the conserved 3’-end sequence of all known group-2 
coronaviruses. Using 150 sequencing primers, sequencing 
in both directions was performed on an ABI Prism 3100 
Genetic Analyzer (Applied Biosystems, Foster City, CA, 
USA) using the BigDye terminator v3.1 cycle sequencing 
kit. Chromatogram sequencing files were inspected with 
Chromas 2.2 (Technelysium, Helensvale, Australia), and 
contigs were prepared using SeqMan II (DNASTAR, 
Madison, WI). 


DNA sequence submission 


The nucleotide sequence data reported in this paper were 
deposited in GenBank using the National Center for 
Biotechnology Information (NCBI, Bethesda, MD) BankIt
v3.0 and Sequin v5.35 submission tools under accession 
numbers AY903454—AY903460. 


DNA and protein sequence analysis 
DNA and protein similarity searches were performed
using the NCBI BLAST (Basic local alignment search tool)
server on GenBank DNA database release 118.0 (Altschul et
al., 1990). Pairwise nucleotide and protein sequence alignments
were performed using FASTA algorithms in the
ALIGN program on the GENESTREAM network server 
(http://www2.igh.cnrs.fr) at the Institut de Génétique 
Humaine in Montpellier, France (Pearson et al., 1997). 
Multiple sequence alignments were prepared using CLUS- 
TALX version 1.82 (Thompson et al., 1997), and manually 
edited in GENEDOC (Nicholas et al., 1997). The SimPlot 
program version 3.2 was used to plot the genetic distance 
between two HCoV-OC43 strains versus nucleotide posi- 
tions (Lole et al., 1999). Phylogenetic analyses were 
conducted using MEGA version 2.1 (Kumar et al., 2001). 
Potential N-glycosylation sites in the HCoV-OC43 spike 
proteins were predicted using the CBS NetNGlyc 1.0 Server 
(http://www.cbs.dtu.dk/services/NetNGlyc/). 
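
The sliding-window comparison underlying a similarity plot can be sketched as follows. This is not the SimPlot implementation: it uses the raw proportion of mismatched sites rather than a substitution model, skips gapped columns, and the function name and toy sequences are hypothetical. The default window and step sizes are those given in the legend of Fig. 2.

```python
def windowed_distance(ref, query, window=600, step=100):
    """Percentage of mismatched sites per sliding window over two aligned sequences.

    Returns a list of (window centre, percent distance) points.
    """
    if len(ref) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    points = []
    for start in range(0, len(ref) - window + 1, step):
        a = ref[start:start + window]
        b = query[start:start + window]
        # Ignore alignment columns in which either sequence has a gap
        pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
        if not pairs:
            continue
        mismatches = sum(1 for x, y in pairs if x != y)
        points.append((start + window // 2, 100.0 * mismatches / len(pairs)))
    return points

# Toy aligned sequences (hypothetical) with a single difference at the third site
ref = "ACGT" * 5
query = "ACTT" + "ACGT" * 4
print(windowed_distance(ref, query, window=10, step=5))
```

Plotting the returned points for each strain against the same reference gives one curve per strain, with peaks marking the most divergent genome regions.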


Timing the most recent common ancestor 


The relationship between sampling date and genetic 
divergence was investigated using a linear regression, based 
on a maximum likelihood tree, as implemented in the Path-O- 
Gen software, kindly provided by Andrew Rambaut (Uni- 
versity of Oxford, UK). The molecular clock hypothesis was 
tested using a likelihood ratio test that evaluates the relative 
goodness-of-fit of a model assuming a molecular clock for 
serially-sampled data compared to a model that does not 
assume rate constancy (Rambaut, 2000). Evolutionary rates 
and divergence times were estimated using Bayesian 
inference in BEAST v1.03 (Drummond, A., Rambaut, A. 
BEAST v1.0, available from http://evolve.zoo.ox.ac.uk/ 
beast/, Drummond et al., 2002). Markov chain Monte Carlo 
(MCMC) inferences were made under a constant population 
size demographic function. A chain was run for 10 × 10^6
generations and sampled every 1000th generation after burn- 
in (10%). 
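
The linear regression of genetic divergence on sampling date mentioned above can be sketched with ordinary least squares. This is a simplified illustration and not the Path-O-Gen or BEAST procedure: the slope gives a crude rate estimate and the x-intercept a crude date for the root. The function name and the data points are hypothetical and were generated from an invented rate, not taken from this study.

```python
def root_to_tip_regression(dates, distances):
    """Least-squares fit of root-to-tip distance on sampling date.

    Returns (rate, root_date): the slope in substitutions per site per year
    and the date at which the fitted line crosses zero divergence.
    """
    n = len(dates)
    mean_x = sum(dates) / n
    mean_y = sum(distances) / n
    sxx = sum((x - mean_x) ** 2 for x in dates)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dates, distances))
    rate = sxy / sxx
    intercept = mean_y - rate * mean_x
    return rate, -intercept / rate

# Hypothetical data: a perfect clock with rate 2e-4 and a root dated 1950
dates = [1970, 1980, 1990, 2000]
distances = [2e-4 * (d - 1950) for d in dates]
rate, root_date = root_to_tip_regression(dates, distances)
print(rate, root_date)
```

With real data the points scatter around the line, and the quality of the fit indicates how clock-like the data are before a formal test or a Bayesian analysis is run.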